-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternative stat API #599
Alternative stat API #599
Conversation
8844bbc
to
ea7a60d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uring 0.7 is out now. I pushed an update for that to this PR.
lib_eio_linux/low_level.ml
Outdated
@@ -366,6 +375,33 @@ let mkdir_beneath ~perm dir path = | |||
try eio_mkdirat parent leaf perm | |||
with Unix.Unix_error (code, name, arg) -> raise @@ Err.wrap_fs code name arg | |||
|
|||
let float_of_time (s, ns) = Int64.to_float s +. (float ns /. 1e9) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OCaml itself makes sure the integer part isn't changed by rounding (ocaml/ocaml#9490). We should probably do that too (here and in eio_posix).
lib_eio_posix/low_level.mli
Outdated
@@ -36,7 +36,8 @@ val send_msg : fd -> ?fds:fd list -> ?dst:Unix.sockaddr -> Cstruct.t array -> in | |||
val getrandom : Cstruct.t -> unit | |||
|
|||
val fstat : fd -> Unix.LargeFile.stats | |||
val lstat : string -> Unix.LargeFile.stats | |||
val lstat : string -> Unix.LargeFile.stats (* TODO: remove? *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be able to remove this now.
I'm just playing around with Patrick's help on a revised version of this interface that can minimise allocations for stats at the Eio level as well.
With this, the stat call takes in a continuation which receives just the specified fields, and the Linux backend can then translate that into a statx while the posix/other backends just do a full stat. I'm just plumbing it all through to see what it looks like. |
@avsm I've just read the suggested patch. While I am personally familiar with both GADTs and defining custom list types, because both of those things "leak" into the Yes, with a good example or two it would be fine. But it would mean forcing most users to refer to the docs anytime they want to use Since the stats call is used in a wide range of use cases, what about offering |
I agree with the concerns about the obscurity of GADTs, but really do think that the default interface shouldn't be quite as inefficient as a complete retrieval of all the statistics, most of which are promptly thrown away. Browsing through sherlocode for Unix.stat shows that a majority of the uses are of the form "(Unix.stat).field" to retrieve just one value. We could just provide single-field accessors (for which |
I think that's a great compromise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With statx support and the others, do you think it would also be good to provide a Path.exists
that uses the statx whilst we're here?
lib_eio/path.mli
Outdated
(** [stat ~follow t] returns metadata about the file [t]. | ||
|
||
If [t] is a symlink, the information returned is about the target if [follow = true], | ||
otherwise it is about the link itself. *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the concerns about the GADT in the API, perhaps a small example in the documentation could help here?
otherwise it is about the link itself. *) | |
otherwise it is about the link itself. | |
For example, if you want to get a file's size along with what kind of file it is, you | |
can write: | |
{[ | |
let kind, size = | |
Path.stat ~follow:true path File.Stat.[ kind; size ] (fun k s -> (k, s)) | |
in | |
Eio.traceln "kind: %a, size: %Ld" File.Stat.pp_kind kind size | |
]} | |
*) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 28169b5
Sorry for the slight bikeshedding, but do you think it would be an improvement to move the accessor functions into a
I'd like to add my support for this suggestion. Otherwise it's something that a lot of users will cobble toghether, sometimes poorly by using |
I'm playing around with rearranging the accessor functions in the module so that the advanced ones are at the bottom of the interface rather than the top. That makes the simpler ones much more discoverable... In the meanwhile though, the blocker to adding |
let exists ~follow path =
try ignore (stat ~follow path : File.Stat.t); true
with Exn.Io (Fs.E Not_found _, _) -> false |
Take 3 pushed; I've:
Still need:
Another round of bikeshedding on this version welcome! :-) |
lib_eio_linux/eio_linux.mli
Outdated
@@ -153,8 +153,11 @@ module Low_level : sig | |||
val await_writable : fd -> unit | |||
(** [await_writable fd] blocks until [fd] is writable (or has an error). *) | |||
|
|||
val fstat : fd -> Eio.File.Stat.t | |||
(** Like {!Unix.LargeFile.fstat}. *) | |||
val statx : ?buf:Uring.Statx.t -> ?fd:fd -> ?flags:Uring.Statx.Flags.t -> string -> ('a, 'b) Eio.File.stats -> 'a -> 'b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so clear on whether I hid dir_fd
right here... need to double check.
lib_eio_linux/low_level.ml
Outdated
ctime = ust.st_ctime; | ||
} | ||
with Unix.Unix_error (code, name, arg) -> raise @@ Err.wrap_fs code name arg | ||
let float_of_time s ns = Int64.to_float s +. (float ns /. 1e9) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still needs the rounding fix @talex5 pointed out in an earlier review.
lib_eio_posix/eio_posix_stubs.c
Outdated
@@ -161,6 +163,68 @@ CAMLprim value caml_eio_posix_mkdirat(value v_fd, value v_path, value v_perm) { | |||
CAMLreturn(Val_unit); | |||
} | |||
|
|||
static double double_of_timespec(struct timespec *t) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woops, @patricoferris @talex5 the obvious optimisation I just realised here is that we can also just pass around a stat
buf in the Posix case, and have field accessors for it as noalloc just as we do for Eio_linux. This would avoid the unconditional copy into OCaml land for every field, and make the backends much more similar. The only difference then would be that the kernel is adding more fields into the stat
buffer than requested when using Posix.
I haven't pushed it here just yet, but I have a WIP |
Somewhat unexpectedly, hacking in an Eio.Pool to reuse statx buffers in the invocation of Eio_linux.Low_level.stat seems to consistently slow down @patricoferris' benchmark by 10% or so, and allocate more memory.
(This version just printfs to make sure that the pool isn't freeing/reallocation, which it isn't. The right number of statx buffers are indeed allocated). Not had a chance to investigate further. |
Just a quick update to say I've consistently had issues with uring and big statting programs on a 5.15.0 kernel (it manifests as a |
I added a benchmark for fstat (#616) to see the effect of using the GADT vs a record. It seems to make no significant difference to performance, even when I remove the actual call to However, using uring for fstat is about 27x slower on my machine (Linux 6.1.52). Most of the time is spent in the kernel, and I think uring is pushing it to a worker thread. So we should probably leave that using the |
Interesting, sorry for dropping the ball a little on this PR. I'm not sure if it is related at all, but I noticed considerably slower code in https://github.com/patricoferris/aperitif when comparing Eio and the plain OCaml sequential code (confusingly under |
I'm not sure what to do with this PR. I'd like to get stat support in soon, but it's unclear if the change to a GADT is an improvement. Some possible problems:
I wonder if we should merge it now using the original type, and then add an optional argument later to say you only want some of the fields? In theory the current layout is a bit inefficient (with the |
I'm ok with unblocking this PR with the current interface, but there are some important considerations we should keep in mind for the next rev that we've learnt from this.
|
Yes, we need to work out what the minimum should be. I'm on 6.1 and I haven't seen that error yet. I'm working on a minimal patch to support stat with the current record system here: https://github.com/ocaml-multicore/eio/compare/main...talex5:eio:stat?expand=1 (will open a PR soon). Would be good to get some benchmarks indeed. I did some quick tests recently and it looked like using the GADT with lots of fields was slower than using the record, while using it with a single field was no faster. Not sure why, though. |
Co-authored-by: Thomas Leonard <talex5@gmail.com> Co-authored-by: Patrick Ferris <patrick@sirref.org>
I squashed and rebased this on top of #620, so it contains just the bits missing from that:
|
Probably related to this change in 5.18: io-uring: Make statx API stable (from @SGrondin's list at https://gist.github.com/SGrondin/be12fae9e4851fb864bc7d5de9297f1a). Fixed by #624. |
Closing for now, as most of this PR got merged and it's unclear it we want the other bits at all. |
(this PR was originally adding
stat
support, but that got split out and merged separately)This is mostly a draft until ocaml-multicore/ocaml-uring#95 is merged and released. But this is a continuation of https://github.com/talex5/eio/tree/stat adding support for linux. I'll see what I can do Windows...
This will also be useful for #594 as a fast "exists" check I think.